Goto

planning and reasoning


PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

Neural Information Processing Systems

Generating plans of action and reasoning about change have long been considered core competences of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic of research. Most claims about LLM planning capabilities are, however, based on common-sense tasks, where it becomes hard to tell whether LLMs are planning or merely retrieving from their vast world knowledge. There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change. PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities. Our studies also show that on many critical capabilities, including plan generation, LLM performance falls quite short, even with the SOTA models. PlanBench can thus function as a useful marker of progress of LLMs in planning and reasoning.
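
As an illustration of what such a benchmark checks mechanically, the sketch below validates a candidate plan against STRIPS-style action semantics in a two-block Blocksworld. This is a hypothetical simplification for exposition, not PlanBench's actual domain encoding or harness.

```python
# Hypothetical sketch of the core check a PlanBench-style harness runs:
# apply a candidate plan step by step under STRIPS-like semantics and
# test whether the goal holds. Not PlanBench's actual encoding.

ACTIONS = {
    # name: args -> (preconditions, add effects, delete effects)
    "unstack": lambda a, b: ({f"on({a},{b})", f"clear({a})", "handempty"},
                             {f"holding({a})", f"clear({b})"},
                             {f"on({a},{b})", f"clear({a})", "handempty"}),
    "putdown": lambda a: ({f"holding({a})"},
                          {f"ontable({a})", f"clear({a})", "handempty"},
                          {f"holding({a})"}),
}

def validate_plan(state, plan, goal):
    """Return (valid, message); fail fast on an unmet precondition."""
    for name, *args in plan:
        pre, add, delete = ACTIONS[name](*args)
        if not pre <= state:
            return False, f"precondition failed at {name}{tuple(args)}"
        state = (state - delete) | add
    return goal <= state, "plan executed"

init = {"on(A,B)", "ontable(B)", "clear(A)", "handempty"}
goal = {"ontable(A)", "ontable(B)"}
candidate = [("unstack", "A", "B"), ("putdown", "A")]  # e.g., parsed from LLM output
print(validate_plan(init, candidate, goal))  # (True, 'plan executed')
```

Because validity is decided by simulation rather than string matching, the same harness extends to new domains by swapping in different action definitions, which is the extensibility the abstract emphasizes.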


Probabilistic World Modeling with Asymmetric Distance Measure

Song, Meng

arXiv.org Artificial Intelligence

Representation learning is a fundamental task in machine learning, aiming at uncovering structures from data to facilitate subsequent tasks. However, what makes a good representation for planning and reasoning in a stochastic world remains an open problem. In this work, we posit that learning a distance function is essential to allow planning and reasoning in the representation space. We show that a geometric abstraction of the probabilistic world dynamics can be embedded into the representation space through asymmetric contrastive learning. Unlike previous approaches that focus on learning mutual similarity or compatibility measures, we instead learn an asymmetric similarity function that reflects state reachability and allows multi-way probabilistic inference. Moreover, by conditioning on a common reference state (e.g., the observer's current state), the learned representation space allows us to discover the geometrically salient states that only a handful of paths can lead through. These states can naturally serve as subgoals for breaking down long-horizon planning tasks. We evaluate our method in gridworld environments with various layouts and demonstrate its effectiveness in discovering such subgoals.
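
The key ingredient, an asymmetric similarity trained contrastively, can be sketched as follows. Using two separate encoders makes the score directional, so sim(s, s') need not equal sim(s', s); the architecture, dimensions, and names here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricSimilarity(nn.Module):
    """Sketch of an asymmetric similarity: separate 'source' and 'target'
    encoders make sim(s, s') != sim(s', s), so the score can reflect
    directed reachability rather than mutual similarity. A hypothetical
    simplification, not the paper's exact model."""
    def __init__(self, obs_dim, embed_dim=64):
        super().__init__()
        self.f_src = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                   nn.Linear(128, embed_dim))
        self.f_tgt = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                   nn.Linear(128, embed_dim))

    def forward(self, s, s_next):
        # (B, B) matrix of directed similarity logits
        return self.f_src(s) @ self.f_tgt(s_next).T

def contrastive_loss(model, s, s_future):
    """InfoNCE-style loss: each state's observed future is the positive,
    other futures in the batch serve as negatives."""
    logits = model(s, s_future)
    labels = torch.arange(s.shape[0])
    return F.cross_entropy(logits, labels)

# Toy usage with random stand-ins for (state, future-state) pairs.
model = AsymmetricSimilarity(obs_dim=10)
s, s_future = torch.randn(32, 10), torch.randn(32, 10)
loss = contrastive_loss(model, s, s_future)
loss.backward()
```

The directionality matters because reachability is not symmetric in a stochastic world: it may be easy to reach s' from s but hard to return.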


Can Large Language Models Reason and Plan?

Kambhampati, Subbarao

arXiv.org Artificial Intelligence

A version appears in the Annals of The New York Academy of Sciences: https://nyaspubs.onlinelibrary.wiley.com/doi/10.1111/nyas.15125. The seeming versatility of LLMs has, however, led many researchers to wonder whether they can also do well on planning and reasoning tasks typically associated with System 2 competency. Nothing in the training and use of LLMs would seem to suggest remotely that they can do any type of principled reasoning (which, as we know, often involves computationally hard inference/search). What LLMs are good at is a form of universal approximate retrieval, which means they can't even guarantee memorizing complete answers. Despite this, "Large Language Models are Zero-Shot <insert-your-reasoning-task>" has almost become a meme. So, are these n-gram models on steroids really capable of planning and reasoning?


[2206.10498] Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)

#artificialintelligence

Recent advances in large language models (LLMs) have transformed the field of natural language processing (NLP). From GPT-3 to PaLM, the state-of-the-art performance on natural language tasks is being pushed forward with every new large language model. Along with natural language abilities, there has been significant interest in understanding whether such models exhibit reasoning capabilities, assessed with the use of reasoning benchmarks. However, even though results are seemingly positive, these benchmarks prove to be simplistic in nature, and the performance of LLMs on them cannot be used as evidence to support the, at times outlandish, claims being made about LLMs' reasoning capabilities. Further, these benchmarks represent only a very limited set of simple reasoning tasks; we need to look at more sophisticated reasoning problems if we are to measure the true limits of such LLM-based systems. Motivated by this, we propose an extensible assessment framework to test the capabilities of LLMs on reasoning about actions and change, a central aspect of human intelligence. We provide multiple test cases that are more involved than any of the previously established benchmarks, each of which evaluates a different aspect of reasoning about actions and change. Results on GPT-3 (davinci), Instruct-GPT3 (text-davinci-002), and BLOOM (176B) showcase subpar performance on such reasoning tasks.
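
To indicate how such an assessment framework can generate graded test cases mechanically, here is a small sketch: the ground-truth answer comes from simulating the action sequence, so new cases can be produced and scored automatically. The toggle domain and the query_llm placeholder are invented for illustration and are not the paper's actual test cases or API.

```python
import random

def make_effects_case(max_flips=6):
    """A light starts OFF; each flip toggles it. Ground truth is the
    parity of the number of flips, computed here by simulation."""
    n = random.randint(1, max_flips)
    state = False
    for _ in range(n):          # simulate the action sequence
        state = not state
    question = (f"A light is OFF. You flip its switch {n} times, "
                "one flip at a time. Is the light ON now? "
                "Answer yes or no.")
    return question, "yes" if state else "no"

def grade(model_answer, gold):
    return model_answer.strip().lower().startswith(gold)

question, gold = make_effects_case()
# answer = query_llm(question)   # placeholder for the model under test
answer = "yes"                   # stand-in response for this demo
print(question)
print("correct:", grade(answer, gold))
```

Because cases are generated and graded programmatically, the framework can scale up difficulty (longer action sequences, more objects) without any manual annotation, which is what makes it extensible.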